Sparse geometry for super resolution video processing

ABSTRACT

In a method of analyzing an input video sequence, pixels of synthesized images of an output video sequence are associated with respective directions of regularity belonging to a predefined set of directions. A first subset of candidate directions is determined from the predefined set of directions for a region of a first image of the output sequence. For a corresponding region of a second synthesized image of the output sequence following the first image, a second subset of candidate directions is determined from the predefined set of directions, based on images of the input sequence and the first subset of candidate directions. The directions of regularity for pixels of this region of the second synthesized image are detected from the second subset of candidate directions. The recursive determination of the subsets of candidate directions provides a sparse geometry for efficiently analyzing the video sequence.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation under 35 U.S.C. §120 of U.S. application Ser. No. 12/812,201, titled “SPARSE GEOMETRY FOR SUPER RESOLUTION VIDEO PROCESSING,” filed on Jul. 8, 2010, which is hereby incorporated by reference in its entirety. U.S. patent application Ser. No. 12/812,201 is a National Stage application under 35 U.S.C. §371 of International Application PCT/IB2008/051270, titled “SPARSE GEOMETRY FOR SUPER RESOLUTION VIDEO PROCESSING,” filed on Jan. 11, 2008.

BACKGROUND OF THE INVENTION

The present invention relates to digital video processing. It is applicable, in particular, in the field of super-resolution video processing. Super-resolution video processing methods are used in various applications including super-resolution interpolation (such as frame-rate conversion, super-resolution video scaling and deinterlacing) and reduction of compression artifacts and/or noise.

In digital systems, a video sequence is typically represented as an array of pixel values I_(t)(x) where t is an integer time index, and x is a 2-dimensional integer index (x₁, x₂) representing the position of a pixel in the image. The pixel values can for example be single numbers (e.g. gray scale values), or triplets representing color coordinates in a color space (such as RGB, YUV, YCbCr, etc.).

Super-resolution video processing methods consist of computing new pixel values (for interpolation) or new values of existing pixels (for noise reduction) by combining pixel values of several temporally adjacent video frames.

WO 2007/115583 A1 discloses a super-resolution video processing method which exhibits very few artifacts. For each new pixel to be calculated, the method selects the interpolator best suited to computing that pixel. For certain particular sequences, however, it may be necessary to enhance the method by increasing the total number of interpolators considered, which improves quality at the cost of higher complexity.

In video interpolation applications, known techniques are motion adaptive or motion compensated.

Motion-adaptive video deinterlacing only provides full resolution deinterlaced frames when the video is not moving. Otherwise, the deinterlaced frames exhibit jagged contours or lower resolution textures, and flicker. An example of an advanced motion adaptive technique is described in U.S. Pat. No. 5,428,398.

Motion-compensated techniques are known to reach better quality levels, at the expense of being less robust and displaying in some cases substantially worse artifacts than motion-adaptive techniques. This happens in particular at locations of the video where motion estimation does not work well, like occlusions, transparent objects, or shadows. An example of a motion-compensated deinterlacing technique is described in U.S. Pat. No. 6,940,557.

A standard way to perform frame-rate conversion includes estimating motion between two frames to compute a dense motion field, and computing new frames with motion-compensated interpolation. For the same reasons as above, frame-rate conversion based on these steps has a number of drawbacks: dense motion estimation fails on periodic patterns, on contours and on flat areas.

A popular technique for motion estimation is referred to as “block matching”. In the block matching technique, estimating the motion at x and t consists of finding the displacement v that minimizes a matching energy E_(x)(v) computed over a window W, which is a set of offsets d=(d₁, d₂). A possible form of the matching energy (L₁-energy) is

${E_{x}(v)} = {\sum\limits_{d \in W}\; {{{{l_{t}\left( {x + d} \right)} - {l_{t + 1}\left( {x + d + v} \right)}}}.}}$

Another form frequently used is the L₂-energy or Euclidean distance:

${E_{x}(v)} = {\sum\limits_{d \in W}\; {{{{l_{t}\left( {x + d} \right)} - {l_{t + 1}\left( {x + d + v} \right)}}}^{2}.}}$

Block matching is well suited for motion compensation in video compression schemes such as MPEG, which make use of block-based transforms. If the matching algorithm matches two windows of images that are similar, but do not represent the same object (e.g. matching the first ‘e’ with the second ‘e’ in an image of the word “sweet”), compression efficiency is not impaired. However, when doing video interpolation, matching groups of pixels which do not actually correspond to the same object leads to interpolation artifacts, because the interpolated pixels will reflect an “incorrect motion” due to spatial correlation in the objects appearing in the images.

Block matching methods are computationally intensive, in proportion to the number of possible displacements that are actually considered for each pixel. In video compression again, “fast” block matching strategies consist in limiting the range of possible displacements using predetermined motion subsets. This is not acceptable in video interpolation where using a displacement vector that is too inaccurate leads to blurry interpolated images or to artifacts.

To circumvent these problems in motion estimation, several methods have been developed. A first set of methods imposes a smoothness constraint on the motion field, i.e. requires that pixels close to one another have close motion vectors. This can be achieved with multiscale motion estimation or recursive block matching. Another type of method designed to solve this issue is phase correlation.

U.S. Pat. No. 5,742,710 discloses an approach based on multiscale block matching. In the 2-scale case, block matching is performed between copies of I_(t) and I_(t+1) that have been reduced in size by a factor of 2 in each dimension (i.e. four times fewer pixels), and the resulting displacement map is then refined to obtain a map with twice the resolution. The refinement process is a limited-range search around the coarse-scale results. As a result, the cost of the displacement search is reduced because full-range searches are done only on smaller images. The resulting displacement field is also smoother because it is a refinement of a low-resolution map. However, the motion in a scene cannot be accurately accounted for by a smooth displacement map: the motion field is inherently discontinuous, in particular around object occlusions. Enforcing a displacement map smoothness constraint is therefore not an appropriate way to address the robustness issue.
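The 2-scale scheme can be sketched as follows. The 2×2 averaging used for the size reduction, the L₁ window energy with circular borders and the ±1-pixel refinement radius are illustrative assumptions, not details taken from U.S. Pat. No. 5,742,710, and the function names are hypothetical:

```python
import numpy as np


def energy(I0, I1, x, v, half=2):
    """L1 block-matching energy over a (2*half+1)^2 window, with circular borders."""
    h, w = I0.shape
    rows = (x[0] + np.arange(-half, half + 1)) % h
    cols = (x[1] + np.arange(-half, half + 1)) % w
    a = I0[np.ix_(rows, cols)].astype(float)
    b = I1[np.ix_((rows + v[0]) % h, (cols + v[1]) % w)].astype(float)
    return float(np.abs(a - b).sum())


def search(I0, I1, x, centre, radius, half=2):
    """Exhaustive search in a square of the given radius around `centre`."""
    cands = [(centre[0] + a, centre[1] + b)
             for a in range(-radius, radius + 1) for b in range(-radius, radius + 1)]
    return min(cands, key=lambda v: energy(I0, I1, x, v, half))


def downsample2(I):
    """Halve each dimension by averaging 2x2 blocks (four times fewer pixels)."""
    h, w = I.shape
    I = I[: h - h % 2, : w - w % 2].astype(float)
    return 0.25 * (I[0::2, 0::2] + I[1::2, 0::2] + I[0::2, 1::2] + I[1::2, 1::2])


def two_scale_match(I0, I1, x, coarse_radius=4, half=2):
    # Full-range search on the half-size copies only.
    v_c = search(downsample2(I0), downsample2(I1), (x[0] // 2, x[1] // 2),
                 (0, 0), coarse_radius, half)
    # Limited-range refinement (+/-1 pixel) around the upscaled coarse vector.
    return search(I0, I1, x, (2 * v_c[0], 2 * v_c[1]), 1, half)


rng = np.random.default_rng(1)
I0 = rng.integers(0, 256, size=(64, 64))
I1 = np.roll(I0, shift=(4, -2), axis=(0, 1))
print(two_scale_match(I0, I1, (20, 20)))  # (4, -2)
```

Here only 81 candidates are tested on the half-size images and 9 at full resolution, instead of 289 for a ±8-pixel full-range search at full resolution.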

Another method that handles this problem in a similar way is recursive block matching, as disclosed in “True-Motion with 3D Recursive Search Block Matching”, G. De Haan et al., IEEE Transactions on Circuits and Systems for Video Technology, Vol. 3, No. 5, October 1993, pp. 368-379. This method significantly reduces the cost of computing a motion map, but it can still be misled by periodic patterns or even occlusions.

GB-A-2 188 510 discloses a so-called phase correlation method in which a displacement energy map is computed over a large image window for a set of candidate displacements. This map can be computed efficiently using the fast Fourier transform. A subset of displacements corresponding to peak values in the energy map is determined as including the most representative displacements over this window. Block matching is then performed pixelwise, as a second step, considering only this subset of displacements.

This method reduces the complexity of motion estimation and is also able to detect discontinuous motion maps. With the phase correlation technique, the motion map is also regularized and constrained, but in a way very different from spatial regularization: instead of imposing local smoothness on the motion map, phase correlation limits the set of different possible vectors in a motion map to a fixed number.
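A textbook phase-correlation sketch using NumPy's FFT, under the assumptions of periodic image borders and a simple "highest values" peak-picking rule. The function name is hypothetical, and this is not necessarily the exact procedure of GB-A-2 188 510. It covers only the first step; the pixelwise block matching of the second step would then test only the returned displacements:

```python
import numpy as np


def phase_correlation_peaks(I_t, I_t1, n_peaks=3):
    """Return the n_peaks displacements with the highest phase-correlation values."""
    F0 = np.fft.fft2(I_t.astype(float))
    F1 = np.fft.fft2(I_t1.astype(float))
    cross = np.conj(F0) * F1
    cross /= np.maximum(np.abs(cross), 1e-12)  # keep only the phase
    surface = np.real(np.fft.ifft2(cross))     # correlation value for every displacement
    h, w = surface.shape
    peaks = []
    for idx in np.argsort(surface, axis=None)[::-1][:n_peaks]:
        r, c = np.unravel_index(idx, surface.shape)
        # Map FFT indices to signed displacements.
        peaks.append((int(r - h if r > h // 2 else r), int(c - w if c > w // 2 else c)))
    return peaks


rng = np.random.default_rng(2)
I0 = rng.random((32, 32))
I1 = np.roll(I0, shift=(2, -1), axis=(0, 1))
print(phase_correlation_peaks(I0, I1)[0])  # (2, -1)
```

Each displacement is ranked by its own correlation value only, which is the "individual merit" selection criticized below.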

However, phase correlation still requires relatively complex computations based on 2-dimensional fast Fourier transforms that are expensive to implement in hardware. Also, the method selects motion vectors on the basis of their individual merit, as assessed by their phase correlation, so it has a limited ability to provide a minimal set of motion vectors. Indeed, when a moving pattern has a periodic structure or is translation-invariant, several vectors have comparable merit values, and phase correlation is not able to arbitrate between them. The resulting motion-compensated video interpolation process is thus of suboptimal robustness. This also has a cost in terms of complexity because, for all pixels, more candidate motion vectors are considered than necessary.

Other classes of approaches include selecting a first subset of displacements by computing low-complexity matching energies on candidate vectors. This can reduce the computational complexity to some extent, but it is not an appropriate way to make the motion-compensated interpolation more reliable.

Classical and still popular methods for noise reduction in video sequences include motion-compensated recursive or non-recursive temporal filtering. See, e.g., “Noise reduction in Image Sequences Using Motion-Compensated Temporal Filtering”, E. Dubois and S. Sabri, IEEE Transactions on Communications, Vol. COM-32, No. 7, July 1984, pp. 826-832. This consists in estimating motion between a frame and a preceding frame, and filtering the video sequence along the estimated motion with a temporal filter.
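A generic first-order motion-compensated recursive filter can be sketched as below. It assumes an integer global motion known in advance (in practice it would come from a motion estimator) and periodic borders; it illustrates the principle and is not the specific filter of Dubois and Sabri. The function name and the value of `alpha` are hypothetical:

```python
import numpy as np


def mc_recursive_filter(frames, motions, alpha=0.25):
    """First-order motion-compensated recursive filter.

    frames:  list of 2D arrays I_0, I_1, ...
    motions: motions[t] is the integer global displacement from frame t-1 to frame t
             (motions[0] is unused).
    Output:  y_t = alpha * I_t + (1 - alpha) * y_{t-1} displaced along the motion.
    """
    out = [frames[0].astype(float)]
    for t in range(1, len(frames)):
        prev = np.roll(out[-1], shift=motions[t], axis=(0, 1))  # motion-compensate y_{t-1}
        out.append(alpha * frames[t] + (1 - alpha) * prev)
    return out


rng = np.random.default_rng(3)
clean = rng.random((32, 32)) * 255
s = (1, 2)
truth = [np.roll(clean, shift=(t * s[0], t * s[1]), axis=(0, 1)) for t in range(10)]
noisy = [f + rng.normal(0, 10, f.shape) for f in truth]
den = mc_recursive_filter(noisy, [s] * 10)
print(np.abs(den[-1] - truth[-1]).mean() < np.abs(noisy[-1] - truth[-1]).mean())  # True
```

If the motion passed to the filter were wrong, `prev` would be misaligned and the averaging would blur the image, which is why such filters depend on reliable motion.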

Other known methods use motion-compensated 3D wavelet transforms. See “Three-Dimensional Embedded Subband Coding with Optimized Truncation (3D-ESCOT)”, Xu, et al., Applied and Computational Harmonic Analysis, Vol. 10, 2001, pp. 290-315. The motion-compensated 3D wavelet transform described in this paper can be used for noise reduction, by performing a wavelet thresholding on this 3D transform. The limitation of such an approach using lifting-based wavelet transform along motion threads is its very high sensitivity to the corruption of the motion map by noise.

WO 2007/059795 A1 describes a super-resolution processing method that can be used for long-range noise reduction or super-resolution scaling. The method is based on a bandlet transform using multiscale grouping of wavelet coefficients. This representation is much more appropriate for noise reduction or super-resolution scaling than the 3D transform described in the 3D-ESCOT paper. The multiscale grouping performs a variable range image registration that can be computed for example with block matching or any state of the art image registration process. For both super-resolution scaling and noise reduction, it is important that the image registration map used is not corrupted by noise or by aliasing artifacts.

Whatever the application (interpolation or noise reduction), using a motion-compensated approach with a dense flow field has limitations: the aperture problem, and the irrelevance of a single motion model for contents with transparent objects or shadows. Analyzing the local invariance structure of video by detecting at each pixel one or more directions of regularity of the video signal in space and time, as described in WO 2007/115583 A1, provides a more general and robust way to do video interpolation. There is thus a need for a technique which makes it possible to detect such directions efficiently and with enhanced robustness.

An object of the present invention is to propose a method useful for detecting directions of regularity in an input video stream with high accuracy and high robustness. In particular, in super-resolution video interpolation, it is desired to avoid artifacts usually caused by incoherent interpolation directions. In video noise reduction, it is desired to select averaging directions that are not corrupted by noise.

Another object is to reduce substantially the implementation complexity of the super-resolution interpolation or noise reduction processing.

SUMMARY OF THE INVENTION

A method of analyzing an input video sequence is disclosed in which pixels of synthesized images of an output video sequence are associated with respective directions of regularity belonging to a predefined set of directions. The method comprises: determining, from the predefined set of directions, a first subset of candidate directions for a region of a first image of the output sequence; determining, from the predefined set of directions, a second subset of candidate directions for a corresponding region of a second synthesized image of the sequence following the first image, based on images of the input sequence and the first subset of candidate directions; and detecting the directions of regularity for pixels of said region of the second synthesized image from the second subset of candidate directions.

The subset of candidate directions is determined in a time recursion by taking into account the subset determined at a preceding time. Typically, directions will be added to or removed from the subset depending on incremental changes of a cost function caused by such addition or removal. The image “regions” can encompass the whole image area, or only part of it, as discussed further below.

The determination of the second subset of candidate directions may comprise: detecting at least one pair of directions v_(r) and v_(a) such that v_(r) belongs to the first subset of candidate directions, v_(a) belongs to the predefined set of directions but not to the first subset, and a cost function associated with the first subset with respect to the first and second images is higher than the cost function associated with a modified subset including v_(a) and the directions of the first subset except v_(r); and in response to the detection, excluding v_(r) from the second subset and including v_(a) into the second subset.

The technique can use simple operations and structures to accelerate the detection of the directions of regularity, or reduce its implementation cost. It reduces the number of artifacts occurring in motion-compensated video interpolation.

A feature of some embodiments consists in evaluating the relative marginal gain that a new direction provides to an existing subset of directions. In contrast, most existing methods in the specific field of motion estimation only use an absolute efficiency measure of a displacement vector, without taking into account which displacements are already used. The present approach selects sparser direction sets, and also manages to put aside various artifacts.

For example, the known phase correlation method consists of finding, inside a region of the image, the best displacements according to a global phase correlation measure. Within a certain image region, all candidate displacements V_(i) have an associated phase correlation value which can be noted P(V_(i)), for i=1, . . . , n. An optimal subset will then consist of displacements with the highest phase correlation values. This can be compared to selecting the subset of m directions (V_(i))_(i∈S) such that

$\sum\limits_{i \in S}\; {P\left( V_{i} \right)}$

is maximal. The functional

$\sum\limits_{i \in S}\; {P\left( V_{i} \right)}$

on the directions subset is separable, i.e. it can be written as a sum of functionals applied to each direction individually. This choice is commonly made because it is the only case where directly optimizing the functional does not lead to a combinatorial explosion. To find the optimal subset S from the point of view of phase correlation, the m directions with the highest values of P are simply picked, in decreasing order.

If, however, the functional is not separable and can only be written as P({V_(i)}_(i∈S)), the optimization cannot be done using such a simple algorithm: finding the best subset of candidates directly is of high combinatorial complexity. In some cases, however, it is still possible to compute variations of the functional when a vector or direction is added to or removed from the selected subset, i.e. P({V_(i)}_(i∈S))−P({V_(i)}_(i∈S′)) where S and S′ differ by only one element. This opens the way to incremental optimization of the functional in a time-recursive way.
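A toy illustration with hypothetical local costs at four pixels: the separable score ranks each direction by the sum of its own local costs (standing in for an individual merit such as a phase-correlation value), while the non-separable functional counts, at each pixel, only the best direction of the subset. Exhaustive search over subsets works for three directions but not for a large set of directions:

```python
from itertools import combinations

# Hypothetical local costs L_x(v) at four pixels for three directions.
# "A" and "B" explain the same pixels (e.g. two periods of a periodic pattern).
costs = {"A": [0, 0, 9, 9], "B": [0, 0, 9, 9], "C": [9, 9, 0, 9]}


def global_cost(D):
    """Non-separable functional: sum over pixels of the best local cost within D."""
    n = len(next(iter(costs.values())))
    return sum(min(costs[v][x] for v in D) for x in range(n))


m = 2
# Separable selection: rank each direction by its own merit and keep the m best.
separable = sorted(costs, key=lambda v: sum(costs[v]))[:m]
# Non-separable selection: exhaustive search over all m-subsets (combinatorial).
best = min(combinations(costs, m), key=global_cost)

print(separable, global_cost(separable))  # ['A', 'B'] 18
print(best, global_cost(best))            # ('A', 'C') 9
```

The separable ranking keeps the redundant pair A, B, whereas the non-separable cost prefers the complementary pair A, C.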

Hence, in certain embodiments, the determination of the second subset of candidate directions includes: evaluating first margins relating to respective contributions of the individual directions of the first subset to a cost function associated with the first subset; evaluating second margins relating to respective decrements of the cost function resulting from the addition of individual directions of the predefined set to the first subset; and substituting a direction of the predefined set for a direction of the first subset when the second margin evaluated for said direction of the predefined set exceeds the first margin evaluated for said direction of the first subset. It is noted that a global cost function is minimized, whereas techniques such as phase correlation maximize a global correlation measure.
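One reading of this substitution rule is sketched below. It assumes the local costs of every direction are available as per-pixel arrays, that |D| ≥ 2, and that at each step only the weakest selected direction is compared with the strongest candidate (a simplifying assumption). The function names are hypothetical:

```python
import numpy as np


def L(cost_maps, D):
    """Global cost: sum over pixels of min_{v in D} L_x(v)."""
    return float(np.min(np.stack([cost_maps[v] for v in D]), axis=0).sum())


def substitute_once(cost_maps, D):
    """One substitution step on the subset D (assumes |D| >= 2)."""
    D = set(D)
    LD = L(cost_maps, D)
    # First margins: contribution of each selected direction (cost increase if removed).
    first = {v: L(cost_maps, D - {v}) - LD for v in D}
    # Second margins: cost decrease obtained by adding each non-selected direction.
    second = {v: LD - L(cost_maps, D | {v}) for v in set(cost_maps) - D}
    v_r = min(first, key=first.get)
    v_a = max(second, key=second.get)
    if second[v_a] > first[v_r]:
        return (D - {v_r}) | {v_a}
    return D


cost_maps = {
    "a": np.array([[0.0, 5.0], [5.0, 5.0]]),
    "b": np.array([[0.0, 5.0], [5.0, 4.0]]),  # makes "a" redundant
    "c": np.array([[9.0, 0.0], [0.0, 9.0]]),
}
print(sorted(substitute_once(cost_maps, {"a", "b"})))  # ['b', 'c']
```

Direction "a" never beats "b", so its first margin is zero and it is replaced by "c", whose addition lowers the cost from 14 to 4. Applying the step again to {"b", "c"} leaves the subset unchanged.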

The super-resolution processing of the video sequence may be interpolation or noise reduction. Simple noise reduction is also possible.

The input video sequence I_(t)(x) is defined on a grid of points (x, t) called “original pixels”. The output video sequence Î_(τ)(ξ) is defined on a grid of points (ξ, τ) called “target pixels”. A pixel is defined by a position (x, t) or (ξ, τ) and the value I_(t)(x) or Î_(τ)(ξ) of the video image at that location, called a “pixel value”.

In the particular case of video interpolation, some target pixels Î_(τ)(ξ) spread over space and/or time may also be original pixels I_(t)(x) (τ=t, ξ=x) and do not need to be recomputed since we can take Î_(τ)(ξ)=I_(t)(x). The pixels for which a value has to be computed are the target pixels Î_(τ)(ξ) that are not original pixels I_(t)(x); they are called “new pixels” (τ≠t or ξ≠x).

In the case of video deinterlacing, the frame rate is usually the same in the input and output video sequences, so that the time indexes t in the output sequence can be the same as those in the input sequence; they will generally be denoted by integer indexes t, t+1, etc. The video deinterlacing process consists of adding interpolated missing lines into the successive frames of the input sequence. Typically, the odd frames of the input sequence only have odd lines while the even frames only have even lines, i.e. for x=(x₁, x₂), the input video sequence provides I_(t)(x) only if t and x₂ are both odd or both even. The synthesized frames Î_(t) of the output deinterlaced video sequence are made of pixels Î_(t)(ξ) with ξ=(x₁, x₂) and without any parity constraint on the integer line indexes x₂, such that Î_(t)(ξ)=I_(t)(ξ) if t and x₂ are both odd or both even. The object of video deinterlacing is to interpolate the “best” values for Î_(t)(ξ)=Î_(t)(x₁, x₂) where one of t and x₂ is odd and the other one is even. In order to perform such interpolation, it is useful to detect inter-frame and/or intra-frame directions of regularity.
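The parity convention can be made concrete with a short sketch. It assumes a frame stored as a 2D array whose rows are indexed by the line index x₂. The vertical line averaging is only a placeholder for a direction-based interpolator, and the function names are hypothetical:

```python
import numpy as np


def is_original_line(t, x2):
    """Line x2 of input frame t is available iff t and x2 have the same parity."""
    return t % 2 == x2 % 2


def deinterlace_line_average(frame, t):
    """Fill the missing lines of frame t (rows indexed by x2) by vertical averaging.

    Original lines are copied unchanged (Î_t(ξ) = I_t(ξ)); the averaging is only a
    placeholder for a direction-based interpolator.
    """
    out = frame.astype(float).copy()
    height = out.shape[0]
    for x2 in range(height):
        if is_original_line(t, x2):
            continue
        # The neighbouring lines of a missing line are always original lines.
        neighbours = [out[r] for r in (x2 - 1, x2 + 1) if 0 <= r < height]
        out[x2] = np.mean(neighbours, axis=0)
    return out


field = np.array([[0, 0], [-1, -1], [4, 4], [-1, -1]])  # rows 1 and 3 missing at t = 0
print(deinterlace_line_average(field, t=0).tolist())
# [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0], [4.0, 4.0]]
```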

In the case of frame rate conversion, the time indexes t, τ are not the same in the input and output video sequences. Integers t, t+1, etc., can be used to index the frames of the input sequence, and then some frames Î_(τ) are synthesized for non-integer values of τ. The spatial indexes ξ=x=(x₁, x₂) are often the same in the input and output frames I_(t), Î_(τ). The frame rate-converted output sequence includes synthesized frames Î_(τ) for non-integer values of τ. Again, in order to synthesize those intervening frames Î_(τ), an interpolation is performed for which it is useful to detect directions of regularity by analyzing the input video sequence. In order to detect the directions of regularity for the pixels of a synthesized output frame Î_(τ), the analysis will involve at least the frames I_(t) and I_(t+1) of the input sequence located immediately before and immediately after the non-integer time index τ, i.e. t is the integer such that t<τ<t+1.
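As a sketch of this indexing, assuming a constant conversion ratio and measuring τ in units of input frame periods, the output times and the input frames I_(t), I_(t+1) bracketing each of them can be listed with exact rational arithmetic from Python's `fractions` module. The function name `output_schedule` is hypothetical:

```python
import math
from fractions import Fraction


def output_schedule(in_fps, out_fps, n_out):
    """Output time indexes τ (in input-frame units) and the input frames around them.

    Returns tuples (τ, t, t + 1, is_input_frame) where t = floor(τ).
    """
    step = Fraction(in_fps, out_fps)  # time step between consecutive output frames
    schedule = []
    for k in range(n_out):
        tau = k * step
        t = math.floor(tau)
        schedule.append((tau, t, t + 1, tau == t))
    return schedule


for tau, t, t1, is_input in output_schedule(24, 60, 6):
    print(tau, t, t1, is_input)
# 0 0 1 True
# 2/5 0 1 False
# 4/5 0 1 False
# 6/5 1 2 False
# 8/5 1 2 False
# 2 2 3 True
```

Output frames with integer τ coincide with input frames; the others are synthesized from I_(t) and I_(t+1).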

In the case of video noise reduction, all target pixel values have to be recomputed. According to these conventions, combined super-resolution video scaling and noise reduction are a case of super-resolution noise reduction. For simple noise reduction, the target pixel grid (ξ, τ) is the same as that (x, t) of the original pixels: Î_(t)(x)=I_(t)(x)−ν_(t)(x), where ν_(t)(x) is a noise component estimate cancelled by the process. For combined super-resolution noise reduction and scaling, the target pixels are defined on a grid (ξ, τ) different from the original pixel grid (x, t). This grid (ξ, τ) is usually a finer grid that can be defined as a superset of the original pixel grid (x, t).

Another aspect of the invention relates to a computer program product, comprising instructions to carry out a video analysis method as outlined above when said program product is run in a computer processing unit.

Still another aspect of the invention relates to a video processing method, comprising: receiving successive images of an input video sequence; analyzing the input video sequence by applying a method as outlined above; and generating the output video sequence using the detected directions of regularity.

The step of generating the video sequence may comprise performing interpolation between successive images of the input video sequence using the detected directions of regularity. Such interpolation may consist of video deinterlacing or of converting the frame rate of the input video sequence. In another embodiment, the processing of the video sequence may comprise applying a noise reduction operation to the input video sequence using the detected directions of regularity.

Still another aspect of the invention relates to a video processing apparatus, comprising computing circuitry arranged to analyze or process a video sequence as indicated hereabove.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an embodiment of a video processing device.

FIG. 2 is a block diagram of an example of direction selection unit usable in the device of FIG. 1.

FIG. 3 is a flow chart of an exemplary procedure of evaluating cost function margins in a device as illustrated in FIGS. 1 and 2.

FIG. 4 is a flow chart of an alternative embodiment of a loop used in the procedure of FIG. 3.

FIGS. 5 and 6 are flow charts of exemplary procedures of arbitrating between candidate directions in a device as illustrated in FIGS. 1 and 2.

FIGS. 7 and 8 are diagrams illustrating frame rate conversion processing applied to certain image portions.

FIGS. 9, 10, 11 and 12 are diagrams illustrating video deinterlacing processing applied to similar image portions.

FIG. 13 is a block diagram of a video processing apparatus according to an embodiment of the invention.

DESCRIPTION OF PREFERRED EMBODIMENTS

Referring to FIG. 1, a video processing device has an input receiving digital representations of successive images or frames of a video sequence. I_(t), I_(t+1) denote frames at discrete times t and t+1, and I_(t)(x), I_(t+1)(x) denote pixel values of those frames for a pixel located by a 2-dimensional index x=(x₁, x₂). How the time indexes t and spatial indexes x are managed may differ from one video processing application to another, e.g. between deinterlacing, frame rate conversion and noise reduction. This issue will be addressed further below.

A direction selection unit 101 implements a time recursive estimation to determine a subset D_(τ′) of candidate directions for an output frame Î_(τ′) based on a previous subset D_(τ) and on the consecutive input frames. The aforesaid “previous subset D_(τ)” was determined for an output frame Î_(τ) which immediately precedes Î_(τ′) in the output video sequence. For example τ′=τ+1 for deinterlacing or simple noise reduction; τ′=τ+δτ for frame rate conversion or super-resolution noise reduction. The input frames involved in the determination of the subset D_(τ′) at time τ′ include at least I_(t) and I_(t+1) such that t≦τ′<t+1. In certain embodiments, they may further include a few past frames I_(t−1), . . . , I_(t−n) (n≧1).

As referred to herein, a “direction” v=(dx, dt) is meant as a direction in the 3D space in which two dimensions relate to pixel offsets dx=(dx₁, dx₂) in the 2D image space and the third dimension relates to a time offset dt. There are a number of video applications in which it is desired to look for directions of regularity in an incoming video sequence. When doing video interpolation, for example, one must determine the values of certain missing pixels based on “similar” pixels in a neighborhood of the missing pixels. Such a neighborhood can extend in the 2D image space and/or in time, so that it is relevant to look for it in the above-mentioned 3D space. Likewise, in noise reduction applications, the value of an input pixel is corrupted by noise which can be averaged out if it is possible to identify some neighborhood of “similar” pixels. Again, such a neighborhood can extend in the 2D image space and/or in time. The method described below yields directions of regularity for pixels of the images which help determine the “similar” pixel values useful to the processing.

The subset D_(τ) or D_(τ′) is said to define a sparse geometry. Each subset D_(τ) or D_(τ′) is a subset of a set Ω containing all the possible directions of regularity. The geometry defined by D_(τ), D_(τ′) is said to be sparse because for each instant τ, τ′, the number of different directions that can be used is limited to a relatively small number. As described further below, the subset of candidate directions D_(τ), D_(τ′), . . . evolves in time with marginal changes. Directions that would be redundant in D_(τ), D_(τ′) are removed and not used for the pixel-by-pixel processing.

Typically, Ω can contain 200 to 1000 different directions (200≦|Ω|≦1000, bars being used to denote the size of a set). The subsets D_(τ), D_(τ′), . . . can have their sizes limited in the range 10≦|D_(τ)|≦50.

A direction detection unit 102 then determines a distribution of directions of regularity {v} based on the consecutive frames I_(t), I_(t+1) (and possibly a few past frames I_(t−1), . . . , I_(t−n)) by testing only candidate directions belonging to the subset D_(τ′) determined by the selection unit 101. The reduction in size from Ω to D_(τ′) makes it possible to carry out the detection without requiring an exceedingly high complexity.

Finally, the video processing unit 103 uses the detected directions of regularity {v} to perform a video processing, such as deinterlacing, frame rate conversion or noise reduction to deliver output video frames from the input frames I_(t), I_(t+1).

Units 102 and 103 can implement any conventional or state-of-the-art methods, and simple examples will be given for completeness. In particular, the detection unit 102 can use the loss function described in WO 2007/115583 A1. The core of the invention lies in unit 101 that will be described in greater detail.

As the direction selection unit 101 considers a much larger set of directions than the direction detection unit 102, an interesting possibility is to use a simpler cost function in unit 101 than in unit 102. In other words, the local cost functions are estimated more coarsely in the step of determining the direction subset D_(τ′) (selection unit 101) than in the step of picking the directions from that subset (direction detection unit 102). This provides substantial savings in terms of computational complexity or, equivalently, in terms of ASIC/FPGA logic size.

This can be done, for example, by using less precise representations of pixel values, e.g. 5- or 6-bit pixel values in unit 101 instead of 8- to 10-bit pixel values in unit 102. Another possibility is to use, in the direction selection unit 101, convolution windows g (to be described further below) that are simpler to compute than those used in the direction detection unit 102, e.g. window profiles corresponding to simple infinite impulse response (IIR) filters, which do not require as much logic and memory as large explicit finite impulse response (FIR) filters. Also, cost functions (described below) of different computational complexities can be used for the subset selection in unit 101 and for the pixelwise direction detection in unit 102.
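Both simplifications can be sketched in a few lines. The assumptions here are truncation by a bit shift for the precision reduction, and a causal one-sided exponential window realized by a first-order recursion (a symmetric window could be obtained by running the same recursion in both directions). The function names and `alpha` are hypothetical:

```python
import numpy as np


def reduce_precision(I, in_bits=8, out_bits=6):
    """Drop the least significant bits, e.g. 8-bit -> 6-bit pixel values."""
    return np.asarray(I, dtype=np.int32) >> (in_bits - out_bits)


def iir_window(costs, alpha=0.5):
    """Causal first-order IIR smoothing along a row of per-pixel costs.

    y[n] = alpha * y[n-1] + (1 - alpha) * c[n], i.e. a convolution with the
    exponential window g(d) = (1 - alpha) * alpha**d, d >= 0, computed with a
    single accumulator instead of an explicit FIR window.
    """
    y = np.empty(len(costs))
    acc = 0.0
    for n, c in enumerate(costs):
        acc = alpha * acc + (1 - alpha) * c
        y[n] = acc
    return y


print(reduce_precision([255, 128, 3]).tolist())   # [63, 32, 0]
print(iir_window([1.0, 0.0, 0.0, 0.0]).tolist())  # [0.5, 0.25, 0.125, 0.0625]
```

The impulse response printed above is exactly the exponential window g(d), obtained without storing any window coefficients.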

The aim of the selection unit 101 is to compute a subset of directions D_(τ′) providing a useful description of the local regularity of the video sequence at an instant τ′ in the output sequence. The best subset D is the one that minimizes a global cost (or loss) function L(D):

$\begin{matrix} {{L(D)} = {\sum\limits_{x}{\min\limits_{v \in D}\left\lbrack {L_{x}(v)} \right\rbrack}}} & (1) \end{matrix}$

where the sum over the pixels (x) spans the whole image area (or part of it). The quantity L_(x)(v) to be minimized over the candidate directions v of D is a local cost (or loss) function, which can be of various kinds for v=(dx, dt), such as:

Absolute difference: L_(x)(v)=|I_(t)(x)−I_(t+dt)(x+dx)|

Quadratic difference: L_(x)(v)=|I_(t)(x)−I_(t+dt)(x+dx)|²

Weighted sum of absolute differences:

$L_{x}(v) = \sum_{d} g(d) \left| I_{t}(x + d) - I_{t+dt}(x + d + dx) \right|$

Weighted sum of quadratic differences:

$L_{x}(v) = \sum_{d} g(d) \left| I_{t}(x + d) - I_{t+dt}(x + d + dx) \right|^{2}$

where g is a convolution window function, i.e. with non-zero values in a vicinity of (0,0).

Other variants are possible, including computing local cost functions over more than two frames of the video sequence, e.g. L_(x)(v)=|I_(t)(x)−I_(t+dt)(x+dx)|+|I_(t)(x)−I_(t−dt)(x−dx)|, and similar variations.
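A vectorized sketch of these local cost functions, assuming frames stored as NumPy arrays indexed as I[x₁, x₂], integer pixel offsets dx, periodic image borders, and a window g given as a dictionary mapping offsets d to weights g(d). The function names and the window values are hypothetical:

```python
import numpy as np


def shifted(I, dx):
    """Return J with J[x] = I[x + dx]; borders wrap around (an assumption)."""
    return np.roll(I, shift=(-dx[0], -dx[1]), axis=(0, 1))


def local_cost(frames, t, v, quadratic=False, g=None):
    """Per-pixel local cost L_x(v) for v = (dx1, dx2, dt).

    With g=None: absolute or quadratic difference between I_t(x) and I_{t+dt}(x+dx).
    With g (dict offset d -> weight g(d)): weighted sum of those differences over x + d.
    """
    dx1, dx2, dt = v
    diff = frames[t].astype(float) - shifted(frames[t + dt].astype(float), (dx1, dx2))
    e = diff ** 2 if quadratic else np.abs(diff)
    if g is None:
        return e
    out = np.zeros_like(e)
    for d, weight in g.items():
        out += weight * shifted(e, d)  # g(d) * |I_t(x+d) - I_{t+dt}(x+d+dx)|^p
    return out


rng = np.random.default_rng(4)
I0 = rng.integers(0, 256, size=(16, 16))
frames = [I0, np.roll(I0, shift=(2, -1), axis=(0, 1))]  # I_1(y) = I_0(y - (2, -1))
g = {(0, 0): 0.5, (1, 0): 0.25, (-1, 0): 0.25}          # small hypothetical window
print(local_cost(frames, 0, (2, -1, 1), g=g).max())      # 0.0: v follows the motion
print(local_cost(frames, 0, (0, 0, 1)).mean() > 0)       # True
```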

For convenience, we also define the local cost L_(x)(D) of a set of directions as the minimum of the loss function over all directions in that set:

$\begin{matrix} {{L_{x}(D)} = {\min\limits_{v \in D}\left\lbrack {L_{x}(v)} \right\rbrack}} & (2) \end{matrix}$

Note that finding the subset D minimizing (1) is of extreme combinatorial complexity, because the value of adding a direction to the subset D depends on the directions already present in that subset. To overcome this difficulty, an incremental approach is proposed. The minimization is done using time recursion, by applying only marginal changes to D_(τ), D_(τ′), . . . in time.

The direction selection unit 101 as depicted in FIG. 2 has a block 201 for evaluating margins m(v) for different directions v of the set of possible directions Ω, and an arbitration block 202 to decide, based on the margins m(v), which directions of D_(τ) should be excluded from D_(τ′) and which directions of Ω−D_(τ) should be included into D_(τ′). The directions v selected to be added to D_(τ) to get D_(τ′) are chosen depending on how much they would marginally contribute to improving (reducing) the cost function L(D) according to (1). Likewise, the directions v to be removed from D_(τ) are chosen depending on how little they marginally contribute to reducing that cost function L(D).

Deciding which elements are in D_(τ′) cannot be done by evaluating L(D) for the various combinations D which may form D_(τ′). However, how L(D) varies when a new direction v of Ω−D is added to D can be estimated using the margin, noted m(v|D), of a direction v with respect to an existing direction subset D:

m(v|D)=L(D)−L(D+{v})  (3)

where D+{v} denotes the union of the set D and the singleton {v}. In other words, m(v|D) measures how much a new direction v marginally contributes to lowering the cost function (1) already obtained with a subset of directions D. The margins m(v|D) can be computed using:

$\begin{matrix} {{m\left( v \middle| D \right)} = {\sum\limits_{x}{m_{x}\left( v \middle| D \right)}}} & (4) \end{matrix}$

where the local margin m_(x)(v|D) at location x of v with respect to D is:

-   m_(x)(v|D)=0 if L_(x)(v)≧L_(x)(D), i.e. when v is not better at minimizing the cost function at pixel position x than the directions already in D;
-   m_(x)(v|D)=L_(x)(D)−L_(x)(v) otherwise.

Computing a margin m_(x)(v|D) for a fixed D and for each x and each candidate v in Ω−D can be done by determining the quantities L_(x)(D) and L_(x)(v). Then m(v|D) is computed by updating running sums of m_(x)(v|D).
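As an illustration of (3), (4) and the local margin definition, the following sketch computes margins from a precomputed table of local costs. The layout is an assumption: `costs` is a list holding, for each pixel x, a dictionary mapping a direction label to L_(x)(v), and the global cost (1) is taken as the sum over pixels of L_(x)(D), as in claim 3. The printed values check that the pixelwise running sum agrees with L(D)−L(D+{v}).

```python
import math
from typing import Dict, Iterable, List

LocalCosts = Dict[str, float]   # direction label -> L_x(v) at one pixel x


def subset_cost(local: LocalCosts, D: Iterable[str]) -> float:
    """L_x(D) = min over v in D of L_x(v), eq. (2); +inf for an empty subset."""
    return min((local[v] for v in D), default=math.inf)


def global_cost(costs: List[LocalCosts], D) -> float:
    """L(D) = sum over pixels x of L_x(D)."""
    return sum(subset_cost(local, D) for local in costs)


def margin(costs: List[LocalCosts], v: str, D) -> float:
    """m(v|D) = sum_x m_x(v|D), eq. (4), accumulated as a running sum."""
    running_sum = 0.0
    for local in costs:
        # Local margin: L_x(D) - L_x(v) if v beats D at x, otherwise 0.
        running_sum += max(0.0, subset_cost(local, D) - local[v])
    return running_sum


# Hypothetical local costs for three pixels and three directions a, b, c.
costs = [
    {"a": 1, "b": 5, "c": 0},
    {"a": 4, "b": 2, "c": 4},
    {"a": 3, "b": 3, "c": 1},
]
D = {"a"}
for v in ("b", "c"):
    print(v, margin(costs, v, D), global_cost(costs, D) - global_cost(costs, D | {v}))
# b 2.0 2
# c 3.0 3
```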

Let us consider the case of including a new direction v_(a), and removing an already selected direction v_(r) from D_(τ) to compute D_(τ′) as

D_(τ′)=D_(τ)−{v_(r)}+{v_(a)}.

The decrease of the global cost (1) caused by such an exchange can be written as an exchange margin M_(exch)(v_(a), v_(r)):

$M_{exch}(v_{a}, v_{r}) = L(D_{\tau}) - L(D_{\tau} - \{v_{r}\} + \{v_{a}\}) = m(v_{a} \mid D_{\tau}) - m(v_{r} \mid D_{\tau} - \{v_{r}\} + \{v_{a}\}) \qquad (5)$

If M_(exch)(v_(a), v_(r))>0, namely m(v_(a)|D_(τ))>m(v_(r)|D_(τ)−{v_(r)}+{v_(a)}), substituting direction v_(a) for direction v_(r) in D_(τ) reduces the global cost, so it is worth swapping v_(r) and v_(a). (The second expression in (5) follows from (3), since the term L(D_(τ)+{v_(a)}) appears in both margins and cancels out.) Computing these various margins is tractable, but the amount of computation can still be reduced significantly. The underlying idea can be stated as follows: “if v_(a) provides a larger marginal decrease of the global cost than v_(r) was providing, it is reasonable to do the exchange”. In such an approach, instead of computing the exact margins m(v_(r)|D_(τ)−{v_(r)}+{v_(a)}) in (5), some approximations can be made.

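In a first approximation, m(v_(r)|D_(τ)−{v_(r)}+{v_(a)}) is replaced by m(v_(r)|D_(τ)−{v_(r)}), i.e. by the margin that v_(r) provides within D_(τ). The following inequality always holds, because adding v_(a) to a subset can only lower its local costs L_(x), and hence the local margins of v_(r) with respect to that subset: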

m(v_(r)|D_(τ)−{v_(r)})≧m(v_(r)|D_(τ)−{v_(r)}+{v_(a)})  (6)

The complexity gain provided by this approximation is significant: the number of removal margins to be computed is now of order |D| instead of |Ω−D|×|D|. Using this approximation, we can derive an exchange margin M′_(exch)(v_(a), v_(r)) as follows:

M′_(exch)(v_(a), v_(r))=m(v_(a)|D_(τ))−m(v_(r)|D_(τ)−{v_(r)})  (7)

Note that, by (6), the approximate exchange margin M′_(exch)(v_(a), v_(r)) in (7) never exceeds the actual exchange margin M_(exch)(v_(a), v_(r)) in (5). If the approximate exchange margin M′_(exch)(v_(a), v_(r)) is non-negative, the actual exchange margin M_(exch)(v_(a), v_(r)) is therefore also non-negative. So a swap decided based on (7) cannot be a wrong one from the point of view of (5).
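The following sketch evaluates the exact exchange margin (5) and its approximation (7) on a toy cost table (hypothetical values, with the assumed layout of one dictionary of local costs per pixel), showing that the approximation errs on the conservative side.

```python
import math


def subset_cost(local, D):
    """L_x(D) = min over v in D of L_x(v); +inf for an empty subset."""
    return min((local[v] for v in D), default=math.inf)


def global_cost(costs, D):
    """L(D) = sum over pixels of L_x(D), the cost function (1)."""
    return sum(subset_cost(local, D) for local in costs)


def margin(costs, v, D):
    """m(v|D) = sum_x max(0, L_x(D) - L_x(v)), eqs. (3)-(4)."""
    return sum(max(0, subset_cost(local, D) - local[v]) for local in costs)


def exact_exchange_margin(costs, D, v_a, v_r):
    """Eq. (5): m(v_a|D) - m(v_r | D - {v_r} + {v_a})."""
    return margin(costs, v_a, D) - margin(costs, v_r, (D - {v_r}) | {v_a})


def approx_exchange_margin(costs, D, v_a, v_r):
    """Eq. (7): the removal margin of v_r is taken with respect to D - {v_r} only."""
    return margin(costs, v_a, D) - margin(costs, v_r, D - {v_r})


costs = [
    {"a": 1, "b": 5, "c": 0},
    {"a": 4, "b": 2, "c": 4},
    {"a": 3, "b": 3, "c": 1},
]
D = {"a", "b"}
exact = exact_exchange_margin(costs, D, "c", "a")
approx = approx_exchange_margin(costs, D, "c", "a")
print(exact, global_cost(costs, D) - global_cost(costs, {"b", "c"}))  # 3 3
print(approx)                                                        # -1
```

Here replacing a by c is actually beneficial (M_(exch)=3), but (7) gives −1, so the swap would not be made at this step. Consistently with (6), the approximation can miss a good swap but never accepts a harmful one.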

FIG. 3 is a flow chart illustrating a procedure usable by block 201 to evaluate the margins m(v_(a)|D_(τ)) and m(v_(r)|D_(τ)−{v_(r)}) used in (7). In FIG. 3, it is assumed that one subset D_(τ′) of candidate directions is determined for each new input frame I_(t+1) received by the direction selection unit 101. This assumption is valid for video deinterlacing or simple noise reduction (e.g. τ=t, τ′=t+1), or for frame rate doubling (τ=t−½, τ′=t+½). Generalization to frame rate conversion with a ratio other than 2 is straightforward (a procedure of the kind shown in FIG. 3 is generally run for each new output frame to be generated; the above assumption just makes the explanation clearer because it means that the rate of the new output frames is the same as that of the input frames). With this assumption, we can drop the time indexes t−1, t and τ, τ′ due to the time recursion in the procedure. In addition, m(v) stands for m(v_(r)|D_(τ)−{v_(r)}) if the direction v (=v_(r)) is in D (=D_(τ)) and may be removed, and for m(v_(a)|D_(τ)) if the direction v (=v_(a)) is in Ω−D and may be added to D. The margins m(v) are evaluated for all directions v in Ω by updating running sums that are set to zero at the initialization 301 of the procedure.

The procedure scans the pixels x of the frame arrays I_(t) and I_(t+1) one by one, a first pixel x being selected in step 302. A first loop 310 over the directions v of D is executed in order to update the running sums for the directions of D (=D_(τ)) regarding pixel x. This first loop is initialized in step 311 by taking a first direction v in D and setting a variable A to an arbitrarily large value (for example its maximum possible value). At the end of loop 310, variable A will contain the value of L_(x)(D) defined in (2).

In each iteration of loop 310 (step 312), the local cost L_(x)(v) for pixel x and direction v is obtained and loaded into variable L. In step 312, block 201 can either compute L_(x)(v), for example according to one of the above-mentioned possibilities, or retrieve it from memory if the costs L_(x)(v) were computed beforehand. A test 313 is performed to evaluate whether L is smaller than A. If L<A, the direction index v is stored in a variable u and a variable B receives the value A in step 314. Then the value L is assigned to the variable A in step 315. At the end of loop 310, variable u will contain the index of the direction v of D which minimizes L_(x)(v), i.e.

$u = \arg\min_{v \in D} L_{x}(v),$

and variable B will contain the second smallest value of L_(x)(v) for the directions v of D, i.e.

$B = \min_{v \in D - \{u\}} L_{x}(v).$

If L≧A in test 313, the local cost is compared to B in test 316. If A≦L<B (yes in test 316), the variable B is updated with the value L in step 317. If L≧B in test 316, or after step 315 or 317, the end-of-loop test 318 is performed to check if all the directions v of D have been scanned. If not, another direction v of D is selected in step 319 and the procedure returns to step 312 for another iteration of loop 310.

When loop 310 is over, the margin m(u) of the direction u of D_(τ) which minimizes the local cost at pixel x is updated by adding the quantity B−A to it (step 321). As far as pixel x is concerned, removing u from D would increase the local cost by that quantity, whereas removing any other direction of D would leave it unchanged, so the margins of the other directions of D receive no contribution from pixel x.

The processing for pixel x is then continued by a second loop 330 over the possible directions v that are not in D, in order to update the running sums for the directions of Ω−D regarding pixel x.

This second loop is initialized in step 331 by taking a first direction v in Ω−D. In each iteration (step 332), the local cost L_(x)(v) for pixel x and direction v is computed or retrieved to be loaded into variable L. A test 333 is then performed to evaluate whether L is smaller than A=L_(x)(D). If L<A, the margin m(v) for direction v is updated by adding thereto the quantity A−L (step 334) in order to take into account the improvement of the cost function that would result from the addition of v into D regarding pixel x. If L≧A in test 333, or after step 334, the end-of-loop test 335 is performed to check if all the directions v of Ω−D have been scanned. If not, another direction v of Ω−D is selected in step 336 and the procedure returns to step 332 for another iteration of loop 330.

When loop 330 is over, it is determined in test 341 if all pixels x of the relevant frame array have been scanned. If not, another pixel x of the array is selected in step 342 and the procedure returns to step 311. The operation of block 201 regarding the current frame is over when test 341 shows that all the pixels have been processed.

For each new input frame I_(t+1), block 201 thus outputs the margins m(v) for all directions v of Ω, i.e. removal margins for the directions of D and addition margins for the directions of Ω−D.

To initialize the procedure at the beginning of an input video sequence, the subset D can have an arbitrary content, or it can be determined with a coarse method over the first few frames. A correct subset will quickly be built due to the time recursion of the selection procedure.

A second approximation can be made to further reduce the complexity of block 201. In this approximation, m(v_(a)|D_(τ)) is replaced by a modified margin m*(v_(a)|D_(τ)). As in (4), a modified margin m*(v|D) is a pixelwise sum:

$m^{*}(v \mid D) = \sum_{x} m_{x}^{*}(v \mid D) \qquad (8)$

of local modified margins m*_(x)(v|D) defined as:

-   m*_(x)(v|D)=L_(x)(D)−L_(x)(v) if L_(x)(v)<L_(x)(Ω−{v}), i.e. when v is the best direction in Ω from the point of view of minimizing the cost function at pixel position x;
-   m*_(x)(v|D)=0 otherwise.

With the first and second approximations, a modified exchange margin M*_(exch)(v_(a), v_(r)) can be derived as follows:

M*_(exch)(v_(a), v_(r))=m*(v_(a)|D_(τ))−m(v_(r)|D_(τ)−{v_(r)})  (9)

Again, the modified exchange margin M*_(exch)(v_(a), v_(r)) never exceeds the actual exchange margin M_(exch)(v_(a), v_(r)), because of (6) and because m*_(x)(v_(a)|D)≦m_(x)(v_(a)|D). So a swap decided based on (9) cannot be a wrong one from the point of view of (5).

The modified margins m*_(x)(v_(a)|D) can be computed with less expensive computations or circuitry because, for each location x, at most one running sum, corresponding to the single absolute best direction in Ω−D, has to be updated, whereas with non-modified margins m_(x)(v_(a)|D) the number of running sums to update is, in the worst case (test 333 always positive in FIG. 3), equal to |Ω−D|. In implementations using hardwired ASIC or FPGA circuits, the impact on logic size is significant. For the same reason, the reduction of the worst-case execution time in a software implementation is also significant.

With the second approximation, the procedure of FIG. 3 is modified by replacing loop 330 by a modified loop 430 illustrated in FIG. 4. Loop 430 is initialized in step 431 (replacing step 331) by taking a first direction v in Ω−D and initializing another variable A* with the value A=L_(x)(D). At the end of loop 430, variable A* will contain the minimum of L_(x)(v) over all directions v in Ω, i.e. L_(x)(Ω).

In each iteration, the local cost L_(x)(v) for pixel x and direction v∈Ω−D is computed or retrieved and loaded into variable L in step 432. A test 433 is then performed to evaluate whether L is smaller than A*. If L<A*, the above-mentioned variable u is updated to contain the direction index v, and the value L is assigned to the variable A* in step 434. If L≧A* in test 433, or after step 434, the end-of-loop test 435 is performed to check if all the directions v of Ω−D have been scanned. If not, a further direction v of Ω−D is selected in step 436 and the procedure returns to step 432 for another iteration of loop 430.

When loop 430 is over, the margin m(u) of the direction u of Ω which minimizes the local cost at pixel x is updated by adding the quantity A−A* to it (step 441). If u∈D, then A*=A and step 441 changes nothing. If u∉D, adding u to D would reduce the cost function by A−A* as far as pixel x is concerned, while the margins of the other directions of Ω−D would remain unaffected.

The reduction of complexity results from the fact that the updating step 441 is performed outside loop 430. The downside of this simplification is some loss of accuracy for the less-than-optimal directions of Ω−D, but this is not a significant problem in view of the time recursion of the procedure, which will eventually reveal the directions actually relevant to the video sequence.
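Replacing loop 330 by loop 430 gives the following variant, again a sketch on an assumed table of per-pixel local costs, with the FIG. 4 step numbers in comments. Following the flow chart, when two directions of Ω−D tie for the best local cost, the first one scanned receives the credit, whereas the strict inequality in the definition of m*_(x) would credit neither; this edge case is left as the flow chart has it.

```python
import math


def fig4_margins(costs, D, omega):
    """FIG. 3 with loop 330 replaced by loop 430 (modified addition margins m*).

    costs: list with one dict {direction: L_x(v)} per pixel x (assumed layout).
    Removal margins of D are computed as before; at each pixel only the single
    best direction of omega receives an addition credit.
    """
    m = {v: 0 for v in omega}
    outside = [v for v in omega if v not in D]
    for local in costs:
        A, B, u = math.inf, math.inf, None     # loop 310, unchanged from FIG. 3
        for v in D:
            L = local[v]
            if L < A:
                B, u, A = A, v, L
            elif L < B:
                B = L
        m[u] += B - A                          # step 321
        A_star = A                             # step 431: A* starts at L_x(D)
        for v in outside:                      # loop 430
            L = local[v]                       # step 432
            if L < A_star:                     # test 433
                u, A_star = v, L               # step 434
        m[u] += A - A_star                     # step 441: adds 0 if u is in D
    return m


costs = [{"a": 4, "b": 9, "c": 1, "d": 2}]     # one hypothetical pixel
print(sorted(fig4_margins(costs, {"a", "b"}, {"a", "b", "c", "d"}).items()))
# [('a', 5), ('b', 0), ('c', 3), ('d', 0)]
```

At this pixel, loop 330 of FIG. 3 would also have credited direction d with 4−2=2; loop 430 credits only the overall winner c, so a single running sum is updated.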

Various procedures can be applied by block 202 to arbitrate between the candidate directions v for which the margins m(v) were computed by block 201.

In the simple example depicted in FIG. 5, block 202 selects the direction v of the subset D=D_(τ) which has the lowest margin m(v) as computed by block 201 and which is thus the best candidate for exclusion from D_(τ′) (step 501). It also selects the direction w of Ω−D which has the highest margin m(w), i.e. the best candidate for inclusion into D_(τ′) (step 502). If m(w)>m(v) (test 503), the exchange is done in step 504: v is replaced by w in D so that D_(τ′)=D_(τ)−{v}+{w}. If m(w)≦m(v) in test 503, there is no exchange: D_(τ′)=D_(τ).
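The FIG. 5 arbitration reduces to a minimum, a maximum and one comparison. In this sketch, the margins are a dictionary m(v) as output by block 201; the function name and the set-based representation of D and Ω are illustrative assumptions.

```python
def arbitrate_single_swap(m, D, omega):
    """FIG. 5: exchange the weakest selected direction for the strongest other one."""
    D = set(D)
    outside = set(omega) - D
    if not D or not outside:
        return D
    v = min(D, key=lambda d: m[d])          # step 501: best candidate for exclusion
    w = max(outside, key=lambda d: m[d])    # step 502: best candidate for inclusion
    if m[w] > m[v]:                         # test 503
        return (D - {v}) | {w}              # step 504: D_tau' = D_tau - {v} + {w}
    return D                                # no exchange: D_tau' = D_tau


margins = {"a": 4, "b": 2, "c": 3}
print(sorted(arbitrate_single_swap(margins, {"a", "b"}, {"a", "b", "c"})))  # ['a', 'c']
```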

FIG. 6 illustrates another approach in which block 202 can swap more than one pair of directions. In step 601, the n directions v₁, v₂, . . . , v_(n) of the subset D=D_(τ) which have the lowest margins are selected and sorted by increasing margin: m(v₁)≦m(v₂)≦ . . . ≦m(v_(n)). The number n can be any integer between 1 and |D|. In the case n=1, the procedure of FIG. 6 is the same as that of FIG. 5. In step 602, the n directions w₁, w₂, . . . , w_(n) of Ω−D which have the highest margins are also selected and sorted by decreasing margin: m(w₁)≧m(w₂)≧ . . . ≧m(w_(n)). Then it is determined how many direction pairs can be swapped. For example, after initializing a loop index i (i=1) in step 603, block 202 compares the margins m(w_(i)) and m(v_(i)) in test 604. If direction w_(i) of Ω−D is better than direction v_(i) of D, i.e. m(w_(i))>m(v_(i)), the exchange is done in step 605, w_(i) replacing v_(i) in D, and then i is compared to n in test 606. If i<n, not all the pairs have been checked, and i is incremented in step 607 before checking the next pair in a new test 604. The procedure is terminated when a test 604 reveals that m(w_(i))≦m(v_(i)) for some i≦n, or when i=n in test 606. If n′ direction pairs are swapped (n′≦n), the updated direction subset is D_(τ′)=D_(τ)−{v₁, . . . , v_(n′)}+{w₁, . . . , w_(n′)}.
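A sketch of the FIG. 6 loop, under the same assumed representation (a dictionary of margins, and Python sets for D and Ω). Ties between equal margins are broken by sort order, which the text leaves unspecified.

```python
def arbitrate_multi_swap(m, D, omega, n):
    """FIG. 6: exchange up to n pairs of directions (n = 1 behaves like FIG. 5)."""
    D = set(D)
    weakest = sorted(D, key=lambda d: m[d])[:n]                                 # step 601
    strongest = sorted(set(omega) - D, key=lambda d: m[d], reverse=True)[:n]    # step 602
    new_D = set(D)
    for v_i, w_i in zip(weakest, strongest):   # steps 603, 606 and 607
        if m[w_i] <= m[v_i]:                   # test 604 fails: stop swapping
            break
        new_D = (new_D - {v_i}) | {w_i}        # step 605: w_i replaces v_i
    return new_D


m = {"a": 1, "b": 5, "c": 2, "d": 7, "e": 4, "f": 0}   # hypothetical margins
print(sorted(arbitrate_multi_swap(m, {"a", "b", "c"}, set(m), 3)))  # ['b', 'd', 'e']
print(sorted(arbitrate_multi_swap(m, {"a", "b", "c"}, set(m), 1)))  # ['b', 'c', 'd']
```

With n=3, the pairs (a, d) and (c, e) are swapped, and the loop stops at (b, f) because m(f)≦m(b).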

In an embodiment, when the directions of regularity are detected by unit 102, only directions v that have a margin m(v) above a given threshold T are used. This is easily done once D_(τ′) has been determined by block 202, by ignoring in the direction detection unit 102 the directions v of D_(τ′) such that m(v)<T.

Alternatively, the inclusion of new directions w of Ω−D_(τ) into D_(τ′) can be prevented when m(w) is below the threshold T. There are various ways of doing this. For example, if the procedure of FIG. 6 is used, the number n can be set as the largest integer in {1, 2, . . . , |D|} such that m(w_(i))>T for all indexes i such that 1≦i≦n.
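Both threshold variants can be sketched as below. The text only defines n in {1, 2, . . . , |D|}; returning 0 when even the best outside direction fails the threshold (so that nothing is included) is an assumption, as are the function names.

```python
def threshold_n(m, D, omega, T):
    """Largest n <= |D| such that the n best directions of omega - D all have m > T.

    Returns 0 when even the best outside direction has m <= T (assumed fallback).
    """
    strongest = sorted((m[w] for w in set(omega) - set(D)), reverse=True)
    n = 0
    while n < min(len(D), len(strongest)) and strongest[n] > T:
        n += 1
    return n


def detection_subset(m, D_new, T):
    """First variant: unit 102 ignores the directions v of D_tau' with m(v) < T."""
    return {v for v in D_new if m[v] >= T}


m = {"a": 1, "b": 5, "c": 2, "d": 7, "e": 4, "f": 0}   # hypothetical margins
print(threshold_n(m, {"a", "b", "c"}, set(m), 3))       # 2
```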

The use of the threshold T helps to prune the set of candidate directions and to select a number of candidate directions that is adapted to the geometric complexity of the video, i.e. to select the sparsest set of directions suitable for the video.

FIGS. 7 and 8 illustrate the results provided by an embodiment of the invention in a case where the video processing unit 103 performs interpolation and more particularly frame rate conversion with a ratio of 2 between the frame rates of the output and input video sequences.

The video sequence in this example is a horizontally scrolling caption with the text “Sweeet”. References 701 and 801 denote the image at time t, 703 and 803 the image at time t+1, and 702 and 802 the synthesized image at time τ′=t+½, with a mismatch in FIG. 7 and with a correct interpolation in FIG. 8. Between images 701/801 and 703/803 (times t and t+1), the whole text “Sweeet” has scrolled 10 pixels to the left. A possible cause of mismatch is that the text contains the letter “e” several times with a periodicity of 8 pixels, so the direction detection unit 102 may mistake the first “e” at time t for another “e” in the next input image at time t+1, leading to artifacts as shown in 702.

In the example of FIGS. 7 and 8, the cost function used in unit 101 is centered, and Ω contains only directions v=(dx, dt) with dt=½. The cost for a direction v=(dx, dt) at location x and time τ′=t+½ is then, for example, L_(x)(v)=|I_(τ′−dt)(x−dx)−I_(τ′+dt)(x+dx)|=|I_(t)(x−dx)−I_(t+1)(x+dx)|, or preferably a windowed version of this cost obtained by convolution with a non-negative spatial window function g. Two directions of regularity can be found with a local measure on this sequence:

$v^{(1)} = (dx_{1}^{(1)}, dx_{2}^{(1)}, dt^{(1)}) = (-5, 0, \tfrac{1}{2})$ and $v^{(2)} = (dx_{1}^{(2)}, dx_{2}^{(2)}, dt^{(2)}) = (-1, 0, \tfrac{1}{2}).$

Once a direction v=(dx, ½) is detected by unit 102 for a pixel x at time τ′=t+½, the interpolation for frame rate conversion done in unit 103 may consist in computing Î_(τ′)(x)=Î_(t+1/2)(x)=[I_(t)(x−dx)+I_(t+1)(x+dx)]/2.
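The centered cost and the half-way interpolation can be checked on a single image row (dx₂=0, so the direction reduces to a scalar dx). The ramp data below is hypothetical; a ramp is used rather than text so that the correct direction has a unique zero cost.

```python
def centred_cost(row_t, row_next, x, dx):
    """L_x(v) = |I_t(x - dx) - I_{t+1}(x + dx)| for v = (dx, 0, 1/2) on one row."""
    return abs(row_t[x - dx] - row_next[x + dx])


def interpolate_half(row_t, row_next, x, dx):
    """Î_{t+1/2}(x) = [I_t(x - dx) + I_{t+1}(x + dx)] / 2."""
    return (row_t[x - dx] + row_next[x + dx]) / 2


# Hypothetical ramp row whose content scrolls 10 pixels to the left per frame,
# i.e. I_{t+1}(x) = I_t(x + 10); the matching half-frame displacement is dx = -5.
row_t = list(range(40))
row_next = [i + 10 for i in range(40)]
print(centred_cost(row_t, row_next, 15, -5))      # 0 (correct direction)
print(centred_cost(row_t, row_next, 15, -1))      # 8 (wrong direction)
print(interpolate_half(row_t, row_next, 15, -5))  # 20.0, i.e. I_t(15 + 5)
```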

In FIG. 7, we assume that no sparse geometry is used, so that all directions in Ω are considered in the detection unit 102. For some pixels between the first and the third “e” of the text, the detected direction may match the first “e” at time t with the second “e” at time t+1 (see the squares in FIG. 7) and the second “e” at time t with the third “e” at time t+1, leading to incorrect temporal interpolation. Reference 702 shows an incorrect image with an artifact resulting from this incorrect interpolation. A simple workaround consisting of mixing the interpolated values corresponding to both detected directions v⁽¹⁾, v⁽²⁾ does not solve the problem either.

Using a sparse geometry D_(τ′) in unit 101 helps to overcome this problem. Indeed, if the subset D_(τ) does not contain the direction

$v^{(1)} = (-5, 0, \tfrac{1}{2}),$

the margin of v⁽¹⁾ with respect to D_(τ) will be high, because only v⁽¹⁾ can account for the scrolling of the letters “S”, “w” and “t”. So v⁽¹⁾ will enter D_(τ′) at some time τ′. Once this is done, since v⁽¹⁾ is a possible direction of the video over all letters, including all the “e”s, the margin of

$v^{(2)} = (-1, 0, \tfrac{1}{2})$

will become very low or even zero, because there is no region of the video where it is a possible direction of regularity and v⁽¹⁾ is not. As a result, the direction v⁽²⁾ will be kept out of the set D_(τ′) so that it will not be taken into account in the detection unit 102, or will be ignored because its margin is below a threshold T. The correct interpolation will be computed as depicted in 802.

Note that the temporal interpolation can be done at times other than halfway between two original frames. For example, in applications to conversion between the 50 Hz and 60 Hz frame rate standards, interpolation is done at times τ′=t+h/6, where h is one of 1, 2, 3, 4 or 5. The loss function used in units 101 and 102 can then be adapted accordingly.

FIGS. 9-12 are diagrams similar to FIGS. 7-8 illustrating application of an embodiment of the invention to super-resolution video deinterlacing.

FIGS. 9-10 show the same text “Sweeet” scrolling in an interlaced video format at the input of the apparatus. References 901, 1001, 1101 and 1201 show an even input field at time t−1, references 903, 1003, 1103 and 1203 show the next even input field at time t+1, and references 902 and 1002 show the intervening odd input field at time t. The purpose of deinterlacing is to compute the even lines at time t in order to synthesize a full progressive frame at time τ′=t containing both even and odd lines.

In the example of FIGS. 9-12, the cost function used in unit 101 is centered, and Ω may contain only directions v=(dx₁, dx₂, dt) such that dt=1 and dx₂ is even. The cost for a direction v=(dx, dt) of Ω at location ξ=x=(x₁, x₂) and time τ′=t is then, for example, L_(x)(v)=|I_(t−dt)(x−dx)−I_(t+dt)(x+dx)|, or a windowed version of this cost. Several directions of regularity can be found a priori on this sequence, including v⁽¹⁾=(dx₁⁽¹⁾, dx₂⁽¹⁾, dt⁽¹⁾)=(−5, 0, 1) and v⁽²⁾=(dx₁⁽²⁾, dx₂⁽²⁾, dt⁽²⁾)=(−1, 0, 1).

Once a direction v=(dx, 1) is detected by unit 102 for a pixel x at time τ′=t, the interpolation for deinterlacing done in the processing unit 103 may consist in computing Î_(τ′)(ξ)=Î_(t)(x)=[I_(t−1)(x−dx)+I_(t+1)(x+dx)]/2.

In FIG. 11, we again assume that the selection unit 101 feeds all directions of Ω to the detection unit 102 without using a sparse geometry. The detection unit 102 cannot properly discriminate between directions v⁽¹⁾=(−5, 0, 1) and v⁽²⁾=(−1, 0, 1), and the output can again display dislocation-type artifacts as shown in 1102.

FIG. 12 illustrates the result of a better deinterlacing when only the direction v⁽¹⁾=(−5,0,1) is retained in the sparse geometry by the selection unit 101, the superfluous direction v⁽²⁾=(−1,0,1) being eliminated in the selection step of the analysis.

Alternatively, in a deinterlacing application, when computing pixels at time τ′=t, a direction can be computed between t−2 and t+2 using the value dt=2 in the directions of Ω, in order to account for directions with higher definition. This means that directions v=(dx, 1) and 2v=(2dx, 2) are used in the same way in the interpolation. Because of the parity constraints of the interlaced source, the corresponding loss functions |I_(t−2)(x−2dx)−I_(t+2)(x+2dx)| can be computed. If a direction 2v=(2dx, 2dt)=(2dx₁, 2dx₂, 2dt) is detected by unit 102, the vertical coordinate dx₂ of the half-direction v can be odd. This allows video sequences including half-pixel vertical speeds to be deinterlaced properly. If such a direction description is used in the direction selection and detection units 101-102, the processing unit 103 may interpolate Î_(τ′)(ξ) as:

Î_(τ′)(ξ)=Î_(t)(x)=[I_(t−2)(x−2dx)+I_(t+2)(x+2dx)]/2

The direction measure that is used can involve time steps of either dt=1 or dt=2. This corresponds to comparing various directions as well as different temporal offsets (1 or 2, or even more).

Another possibility in deinterlacing applications is to compute costs for directions where the fields are shot at irregularly spaced times, in addition to directions associated with fields shot at evenly spaced times. This is for example the case when the original source of the video contents is film converted to video using “telecine”. For example, in 2:2 telecine used in Europe, when 25 fps (frames per second) film is transferred to 50 fps video, each film frame is used to generate two video fields, so fields I₀, I₁, I₂, I₃ are shot at respective times 0 s, 0 s, 2/50 s, 2/50 s, instead of times 0/50 s, 1/50 s, 2/50 s, 3/50 s for video-originating contents. Furthermore, a video signal can contain a mix of film-originating contents and video-originating contents, so this detection has to be made pixelwise. Specific local cost functions can be chosen for detecting whether for a given pixel, the video is film-originating and whether the field just before or just after originates from the same film frame. A configuration of the direction at each pixel is then one of the following:

(film-before)

(film-after)

(video, v)

where “film-before” means that at a given pixel location the contents are film-originating and the preceding field comes from the same film frame, so that missing pixels can be picked at the same location in the preceding field; “film-after” means that at a given pixel location the contents are film-originating and the following field comes from the same film frame; and (video, v) means that at the current pixel location the contents are video-originating and the direction vector is v. This description exemplifies another case where the “direction” can be defined by a local descriptor more complex than a single 3D vector v. In this case, the “direction” is a symbol which is one of (film-before), (film-after), (video, v), where v is a vector.
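The symbolic “direction” can be represented as a small tagged union. In this sketch, the class names, the storage of fields as a dictionary from time index to a 2-D list indexed [x2][x1], and the use of the directional average [I_(t−dt)(x−dx)+I_(t+dt)(x+dx)]/2 for the (video, v) case are assumptions made for illustration. Frozen dataclasses are hashable, so such descriptors can be stored in Ω and in the subsets D like ordinary vectors.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Field = List[List[float]]   # field[x2][x1]


@dataclass(frozen=True)
class FilmBefore:
    """Film-originating pixel; the preceding field comes from the same film frame."""


@dataclass(frozen=True)
class FilmAfter:
    """Film-originating pixel; the following field comes from the same film frame."""


@dataclass(frozen=True)
class Video:
    """Video-originating pixel with direction vector v = (dx1, dx2, dt)."""
    v: Tuple[int, int, int]


Direction = Union[FilmBefore, FilmAfter, Video]


def missing_pixel(fields: Dict[int, Field], t: int, x: Tuple[int, int],
                  direction: Direction) -> float:
    """Value of the missing pixel x = (x1, x2) of field t under a given descriptor."""
    x1, x2 = x
    if isinstance(direction, FilmBefore):
        return fields[t - 1][x2][x1]          # same location, preceding field
    if isinstance(direction, FilmAfter):
        return fields[t + 1][x2][x1]          # same location, following field
    dx1, dx2, dt = direction.v                # (video, v): directional average
    before = fields[t - dt][x2 - dx2][x1 - dx1]
    after = fields[t + dt][x2 + dx2][x1 + dx1]
    return (before + after) / 2


fields = {0: [[1, 2], [3, 4]], 2: [[5, 6], [7, 8]]}   # toy fields around t = 1
print(missing_pixel(fields, 1, (1, 0), FilmBefore()))       # 2
print(missing_pixel(fields, 1, (1, 0), FilmAfter()))        # 6
print(missing_pixel(fields, 1, (1, 0), Video((0, 0, 1))))   # 4.0
```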

In the case of super-resolution video noise reduction, the processing unit 103 of FIG. 1 computes, for each target pixel (ξ, τ), its new value by using a directional averaging function K_(v) at ξ with:

${{\hat{I}}_{\tau}(\xi)} = {\sum\limits_{x,t}{{K_{v}\left( {{\xi - x},{\tau - t}} \right)}.{I_{t}(x)}}}$

where the sum runs over all pixels (x, t) of the input images in a vicinity of (ξ, τ), including the pixel (ξ, τ) itself if ξ=x, τ=t for some point (x, t) of the input grid, and K_(v) depends on the local direction v=(dx,dt). In an exemplary embodiment, the averaging functions K_(v) are directional averaging functions along a direction v=(dx, dt). An example is the function:

K_(v)(x,t)=K₁(t)×K₂(x−t·dx/dt)

where K₁ and K₂ are 1D and 2D averaging kernels, for example Gaussian.
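A minimal sketch of the directional averaging, assuming Gaussian K₁ and K₂ with arbitrarily chosen widths, frames stored as a dictionary from time index to a 2-D list indexed [x2][x1], and weights renormalized to sum to 1 (the formula above does not state a normalization). The spatial neighbourhood at each time t is centred on the point where the trajectory of direction v crosses frame t, so that K₂ is evaluated near its peak.

```python
import math


def gaussian(r2, sigma):
    """Unnormalised Gaussian evaluated at a squared distance r2."""
    return math.exp(-r2 / (2.0 * sigma * sigma))


def directional_kernel(x, t, v, sigma_t=1.0, sigma_x=0.7):
    """K_v(x, t) = K1(t) * K2(x - t*dx/dt), with Gaussian K1, K2 (assumed widths)."""
    dx1, dx2, dt = v
    s1 = x[0] - t * dx1 / dt
    s2 = x[1] - t * dx2 / dt
    return gaussian(t * t, sigma_t) * gaussian(s1 * s1 + s2 * s2, sigma_x)


def denoise_pixel(frames, xi, tau, v, radius=1, time_offsets=(-1, 0, 1)):
    """Weighted sum of K_v(xi - x, tau - t) * I_t(x), renormalised to sum to 1.

    frames maps a time index to a 2-D list indexed [x2][x1]; pixels that fall
    outside the image are skipped.
    """
    dx1, dx2, dt = v
    num = den = 0.0
    for k in time_offsets:
        t = tau + k
        img = frames[t]
        # Centre the neighbourhood where the trajectory of v crosses frame t.
        c1 = xi[0] + round(k * dx1 / dt)
        c2 = xi[1] + round(k * dx2 / dt)
        for x2 in range(c2 - radius, c2 + radius + 1):
            for x1 in range(c1 - radius, c1 + radius + 1):
                if 0 <= x2 < len(img) and 0 <= x1 < len(img[0]):
                    w = directional_kernel((xi[0] - x1, xi[1] - x2), tau - t, v)
                    num += w * img[x2][x1]
                    den += w
    return num / den


# Toy data: a bright dot moving 2 pixels to the right per frame on one-row images.
frames = {t: [[10.0 if x1 == 4 + 2 * t else 0.0 for x1 in range(10)]] for t in (-1, 0, 1)}
along = denoise_pixel(frames, (4, 0), 0, (2, 0, 1))
across = denoise_pixel(frames, (4, 0), 0, (0, 0, 1))
print(along > across)  # True: averaging along the motion preserves the dot better
```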

In another embodiment, the video processing performed in the processing unit 103 receives a variable number of directions from the direction detection unit 102. Each of these directions can be accompanied with a relevance measure. In the case where the number of directions is 0, a fallback interpolation function or averaging function can be used. In the case where the number of directions is larger than 1, the target pixel value can be computed by combining the pixel values computed with each interpolating or averaging function corresponding to each direction. This combination can be an averaging, a weighted averaging using the relevance measure, or a median, or a weighted median, or any other kind of method to combine these pixel values.
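The combination step can be sketched as follows. The method names, the default relevance of 1, and the weighted-median convention (smallest value whose cumulative weight reaches half of the total) are assumptions; the fallback is passed in as a callable standing for whatever fallback interpolation or averaging function is used.

```python
from statistics import median


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]


def combine(values, relevance=None, method="weighted_mean", fallback=None):
    """Combine the per-direction pixel values into one target pixel value.

    values: one interpolated or averaged value per detected direction.
    relevance: optional relevance measure per direction (defaults to 1).
    fallback: callable used when no direction was detected.
    """
    if not values:
        return fallback()
    weights = relevance if relevance is not None else [1.0] * len(values)
    if method == "mean":
        return sum(values) / len(values)
    if method == "weighted_mean":
        return sum(v * w for v, w in zip(values, weights)) / sum(weights)
    if method == "median":
        return median(values)
    if method == "weighted_median":
        return weighted_median(values, weights)
    raise ValueError(f"unknown method: {method}")


print(combine([2, 4], [1, 3]))                         # 3.5
print(combine([1, 2, 10], [1, 1, 5], "weighted_median"))  # 10
```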

In another exemplary embodiment, the noise reduction processing along direction v=(dx, dt) can be any kind of known directional filtering, including infinite impulse response (IIR) filtering.

In another exemplary embodiment, the sparse geometry is used to enhance the type of processing disclosed in WO 2007/059795 A1 when the processed signal is a video signal. The directions (dx, dt) may then be limited to values of dt=1 and to integer values of dx. They can be used to construct a mapping between pixels of a frame at time t and pixels of a frame at time t+1, (x,t)→(x+dx,t+1), and provide an embodiment for the first grouping estimation used in WO 2007/059795 A1.

In an embodiment of the direction selection unit 101, the set Ω of candidate directions is partitioned into a plurality of subsets Ω₁, . . . , Ω_(J) (J>1), and only one of the subsets Ω₁, . . . , Ω_(J) is considered by the direction selection unit 101 at each time τ′ to provide candidates to enter the subset of selected directions D_(τ′). This is useful when the set Ω is too large to be entirely scanned for candidates in every cycle τ′. For example, at a time when subset Ω_(j) is considered (1≦j≦J), loop 330 in FIG. 3 or loop 430 in FIG. 4 is carried out for the directions v that are in Ω_(j) but not in D.
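A sketch of the partitioned scan. The text does not say how Ω is partitioned nor in which order the subsets are visited; the interleaved split and the round-robin schedule below are assumptions.

```python
def partition_directions(omega, J):
    """Split omega into J interleaved subsets Omega_1..Omega_J (one possible split)."""
    ordered = sorted(omega)
    return [set(ordered[j::J]) for j in range(J)]


def candidates_for_cycle(cycle, parts, D):
    """Directions allowed to enter D at a given cycle: Omega_j - D, round robin over j."""
    return parts[cycle % len(parts)] - set(D)


omega = {(dx, 0, 1) for dx in range(-3, 4)}   # 7 hypothetical directions
parts = partition_directions(omega, 3)
print(sorted(candidates_for_cycle(0, parts, {(0, 0, 1)})))   # [(-3, 0, 1), (3, 0, 1)]
```

Each direction of Ω is then examined once every J cycles, so the per-cycle cost of loop 330 or 430 drops roughly by a factor J.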

In certain cases, it may be interesting, in addition to the selection of a global subset D_(τ′) for the whole image area, to split the image support into several windows W_(p,q) of pixels, for example defined as rectangular regions:

W_(p,q)={(x₁, x₂): w×(p−1)<x₁≦w×p and h×(q−1)<x₂≦h×q}

where h and w are respectively the height and the width (in pixels) of these windows, and the window indexes p, q are in the ranges 1≦p≦P, 1≦q≦Q. The total number of windows is P×Q. When P=Q=1, there is only one window consisting of the whole image area as described previously. For each direction v inside each window W_(p,q), a margin m_(p,q)(v|D) can be computed using a formula similar to (4), but with a sum spanning an image region limited to this window W_(p,q):

$\begin{matrix} {{m_{p,q}\left( v \middle| D \right)} = {\sum\limits_{x \in W_{p,q}}{m_{x}\left( v \middle| D \right)}}} & (10) \end{matrix}$

A local subset of directions D_(τ′,p,q)⊂D_(τ′) can be computed using these margins. A third subset D_(τ′,p,q) of candidate directions is thus determined as a subset of the second subset D_(τ′) determined for the whole area of I_(t+1), based on cost margins m_(p,q)(v|D) computed for pixels of the window W_(p,q) in the input images I_(t) and I_(t+1). When the direction detection unit 102 measures a direction at a pixel ξ=x located inside one of the windows W_(p,q), only candidate directions from D_(τ′,p,q) are taken into account. This helps to make the detection more robust against bad directions. Referring again to the example depicted in FIGS. 7-12, the selection makes it possible to eliminate a bad direction (−1, 0, ½) [or (−2, 0, 1)] and to use only the right direction (−5, 0, ½) [or (−10, 0, 1)]. If the scene is more complex and an object elsewhere in the picture happens to exhibit the direction of regularity (−1, 0, ½), this vector (−1, 0, ½) will be present in D_(τ′), and the benefit of the selection made in unit 101 for properly handling the scrolling text may be lost. If the selection margins are recomputed on smaller windows W_(p,q), the probability that a window W_(p,q) includes both the scrolling text and the object having the direction of regularity (−1, 0, ½) is much lower.
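The window indexing and the windowed margins (10) can be sketched as follows. Pixel coordinates are 1-based to match the definition of W_(p,q). The cost table layout and the selection rule (keep a direction of D_(τ′) in window (p, q) when its removal margin there exceeds a threshold T) are hypothetical, since the text does not specify how D_(τ′,p,q) is derived from the margins. The toy data mimics the situation described above: v1 explains the left window, v2 the right one, and both would survive a selection made on the whole image.

```python
import math
from collections import defaultdict


def window_index(x, w, h):
    """(p, q) such that w(p-1) < x1 <= w*p and h(q-1) < x2 <= h*q (1-based pixels)."""
    x1, x2 = x
    return (-(-x1 // w), -(-x2 // h))      # ceiling divisions


def window_removal_margins(costs, D, w, h):
    """m_{p,q}(v | D - {v}) for each v in D, summed over the pixels of W_{p,q} (eq. 10).

    costs: dict mapping pixel (x1, x2) -> {direction: L_x(v)} (assumed layout).
    """
    margins = defaultdict(lambda: {v: 0 for v in D})
    for x, local in costs.items():
        pq = window_index(x, w, h)
        for v in D:
            rest = min((local[u] for u in D if u != v), default=math.inf)
            margins[pq][v] += max(0, rest - local[v])
    return dict(margins)


def local_subsets(costs, D, w, h, T=0):
    """Hypothetical rule: D_{tau',p,q} keeps the v of D whose window margin exceeds T."""
    return {pq: {v for v, mv in per_v.items() if mv > T}
            for pq, per_v in window_removal_margins(costs, D, w, h).items()}


costs = {
    (1, 1): {"v1": 0, "v2": 5},
    (2, 1): {"v1": 0, "v2": 0},   # ambiguous pixel: both directions fit
    (3, 1): {"v1": 6, "v2": 0},
    (4, 1): {"v1": 4, "v2": 1},
}
print(local_subsets(costs, {"v1", "v2"}, 2, 1))   # {(1, 1): {'v1'}, (2, 1): {'v2'}}
```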

When the windows W_(p,q) are too small (e.g., in the case of FIGS. 7-12, a region spanning only one or two “e”s), the selection may become difficult, because on too small windows it is no longer possible to discriminate between two different directions of regularity. A multiscale selection scheme can be devised to avoid this difficulty, by recursively splitting the image support into windows, and each window into sub-windows. For each window, the subset of directions is selected as a subset of the subset of directions that was selected for the parent region (whole image or higher-layer window). In the multiscale selection scheme, one or more of the windows W_(p,q) are further split into a plurality of sub-windows W_(p,q,r,s), and for each sub-window a fourth subset D_(τ′,p,q,r,s) of candidate directions is determined as a subset of the third subset D_(τ′,p,q) determined for the window W_(p,q), based on cost margins m_(p,q,r,s)(v|D) computed for pixels of sub-window W_(p,q,r,s) in the input images I_(t) and I_(t+1):

$m_{p,q,r,s}(v \mid D) = \sum_{x \in W_{p,q,r,s}} m_{x}(v \mid D) \qquad (11)$

The directions of regularity for pixels of sub-window W_(p,q,r,s) of the output image Î_(τ′) are then detected from subset D_(τ′,p,q,r,s), possibly after one or more iterations of the recursive splitting of the windows.
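The recursive scheme can be sketched as a function that selects a subset for a region and recurses into its sub-windows, each time choosing among the directions kept by the parent. The halving split, the threshold rule, the fallback to the parent subset when a region cannot discriminate, and the cost table layout are all assumptions made for illustration.

```python
import math


def removal_margins(costs, pixels, D):
    """m(v | D - {v}) for each v in D, summed over the given pixels."""
    out = {v: 0 for v in D}
    for x in pixels:
        local = costs[x]
        for v in D:
            rest = min((local[u] for u in D if u != v), default=math.inf)
            out[v] += max(0, rest - local[v])
    return out


def split(region):
    """Split an inclusive region (x1_lo, x1_hi, x2_lo, x2_hi) into up to 4 sub-windows."""
    a1, b1, a2, b2 = region
    xs = [(a1, b1)] if a1 == b1 else [(a1, (a1 + b1) // 2), ((a1 + b1) // 2 + 1, b1)]
    ys = [(a2, b2)] if a2 == b2 else [(a2, (a2 + b2) // 2), ((a2 + b2) // 2 + 1, b2)]
    return [(c1, d1, c2, d2) for (c1, d1) in xs for (c2, d2) in ys]


def multiscale_select(costs, region, D, depth, T=0):
    """Return {leaf region: subset}, each subset chosen inside its parent's subset."""
    a1, b1, a2, b2 = region
    pixels = [x for x in costs if a1 <= x[0] <= b1 and a2 <= x[1] <= b2]
    margins = removal_margins(costs, pixels, D)
    # Hypothetical rule: keep directions whose margin exceeds T; if none does,
    # the region cannot discriminate and simply inherits the parent subset.
    subset = {v for v in D if margins[v] > T} or set(D)
    children = split(region)
    if depth == 0 or children == [region]:
        return {region: subset}
    result = {}
    for child in children:
        result.update(multiscale_select(costs, child, subset, depth - 1, T))
    return result


costs = {
    (1, 1): {"v1": 0, "v2": 5},
    (2, 1): {"v1": 0, "v2": 0},   # ambiguous pixel: both directions fit
    (3, 1): {"v1": 6, "v2": 0},
    (4, 1): {"v1": 4, "v2": 1},
}
print(multiscale_select(costs, (1, 4, 1, 1), {"v1", "v2"}, 2))
```

At pixel (2, 1), where v1 and v2 fit equally well, a selection made on that pixel alone cannot discriminate and keeps both directions; in the recursive scheme, the pixel lies in a parent window whose subset is already {v1}, so it inherits that choice.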

In some embodiments, the subset D_(τ′) of selected directions can be constrained to satisfy various criteria. For example:

-   some particular directions (typically (0, 0, 1)) can be forced to stay permanently within D_(τ′), regardless of the margin associated with these directions;
-   the set of directions Ω can also be split into R clusters Ω⁽¹⁾, . . . , Ω^((R)), and a constraint can be enforced that, for each cluster Ω^((r)) (1≦r≦R, R>1), only one or a limited number of directions is selected to be included into subset D_(τ′).
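The two example constraints can be applied to a proposed subset as sketched below. Whether forced directions count toward a cluster's quota is not specified in the text; here they do not, and choosing the directions kept in each cluster by highest margin is an assumption, as are the names.

```python
def enforce_constraints(D, m, forced, clusters, per_cluster=1):
    """Apply the two example constraints to a proposed subset D_tau'.

    forced: directions that always stay in the subset, whatever their margin.
    clusters: disjoint sets Omega^(1), ..., Omega^(R) covering omega.
    per_cluster: maximum number of non-forced directions kept per cluster.
    """
    result = set(forced)
    for cluster in clusters:
        members = [v for v in D if v in cluster and v not in result]
        members.sort(key=lambda v: m[v], reverse=True)   # highest margins first
        result.update(members[:per_cluster])
    return result


m = {"a": 3, "b": 5, "c": 1, "d": 9, "still": 0}        # hypothetical margins
clusters = [{"a", "b"}, {"c", "d"}, {"still"}]
print(sorted(enforce_constraints({"a", "b", "c"}, m, {"still"}, clusters)))
# ['b', 'c', 'still']
```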

The above-described embodiments may be implemented by means of software run by general-purpose microprocessors or digital signal processors, in which case the modules described above with reference to FIGS. 1-6 are understood to be or form part of software modules or routines. They may also be implemented as a hardware component as illustrated in FIG. 13, for example in an application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) for interpolating a video stream, in addition to other video processing blocks 1302, 1304, before and/or after the video interpolation block 1303. Alternatively, the video processing block 1303 may implement a noise reduction method as described above. In an exemplary embodiment, the video processing blocks 1302, 1303, 1304 are implemented in a single chip 1301. The chip also has video input and output interfaces, and external RAM (random access memory) devices 1305 and 1306 as temporary storage required for the different video processing steps performed in 1302, 1303 and 1304. Other variants of this embodiment can equally be considered as part of the invention, with more complete video processing chips, or even system-on-chip devices including other functionalities. The hardware device can then be incorporated into various kinds of video apparatus.

While a detailed description of exemplary embodiments of the invention has been given above, various alternatives, modifications, and equivalents will be apparent to those skilled in the art. Therefore, the above description should not be taken as limiting the scope of the invention, which is defined by the appended claims.

What is claimed is:
 1. A method of analyzing an input video sequence to associate pixels of synthesized images of an output video sequence with respective directions of regularity belonging to a predefined set of directions, the method comprising: determining, from the predefined set of directions, a first subset of candidate directions for a region of a first image of the output video sequence; determining, from the predefined set of directions, a second subset of candidate directions for a corresponding region of a second synthesized image of the output video sequence following the first image, based on images of the input video sequence and the first subset of candidate directions; and detecting the respective directions of regularity for pixels of said region of the second synthesized image from the second subset of candidate directions.
 2. The method as claimed in claim 1, wherein the determination of the second subset of candidate directions comprises: detecting at least one pair of directions v_(r) and v_(a) such that v_(r) belongs to the first subset of candidate directions, v_(a) belongs to the predefined set of directions but not to the first subset, and a cost function associated with the first subset with respect to said regions of the first and second images is higher than the cost function associated with a modified subset including v_(a) and the directions of the first subset except v_(r); and in response to detection of said at least one pair of directions v_(r) and v_(a), excluding v_(r) from the second subset and including v_(a) into the second subset.
 3. The method as claimed in claim 2, wherein the cost function associated with a given subset of directions is a sum, over the pixels of said regions, of minimum values of local costs for the different directions of the given subset.
 4. The method as claimed in claim 1, wherein the determination of the second subset of candidate directions comprises: evaluating first margins relating to respective contributions of the individual directions of the first subset to a cost function associated with the first subset; evaluating second margins relating to respective decrements of the cost function resulting from the addition of individual directions of the predefined set to the first subset; and substituting a direction of the predefined set for a direction of the first subset when the second margin evaluated for said direction of the predefined set exceeds the first margin evaluated for said direction of the first subset.
 5. The method as claimed in claim 4, wherein the first margin for a direction of the first subset is equal to the contribution of said direction to the cost function.
 6. The method as claimed in claim 4, wherein the second margin for a direction of the predefined set is equal to the decrement of the cost function resulting from the addition of said direction to the first subset.
 7. The method as claimed in claim 4, wherein the second margin for a direction of the predefined set is estimated as a sum of local margins over the pixels of said regions, whereby a local margin for a pixel x and a direction v is: L_(x)(D)−L_(x)(v) if v is the best direction in the whole predefined set from the point of view of minimizing a local cost at pixel position x, where L_(x)(v) designates the local cost at pixel position x for direction v and L_(x)(D) designates a minimum value of the local cost at pixel position x for the directions of the first subset; and zero otherwise.
 8. The method as claimed in claim 4, comprising: selecting a first direction having the lowest first margin in the first subset; selecting a second direction having the highest second margin in the predefined set except the first subset; and if the second margin for the selected second direction is above the first margin for the selected first direction, excluding the selected first direction from the second subset and including the selected second direction into the second subset.
 9. The method as claimed in claim 4, comprising: selecting and sorting with increasing margins a number n of directions v₁, v₂, . . . , v_(n) having lowest first margins in the first subset; selecting and sorting with decreasing margins the n directions w₁, w₂, . . . , w_(n) having highest second margins in the predefined set except the first subset; and for each direction w_(i) of the n sorted directions w₁, w₂, . . . , w_(n) of the predefined set except the first subset, with 1≦i≦n, if the second margin is above the first margin for the corresponding direction v_(i) of the n sorted directions v₁, v₂, . . . , v_(n) of the first subset, excluding v_(i) from the second subset and including w_(i) into the second subset.
 10. The method as claimed in claim 4, further comprising excluding from the second subset directions of the predefined set having an evaluated margin below a preset threshold.
 11. The method as claimed in claim 4, wherein the cost function associated with a given subset of directions is a sum, over the pixels of said regions, of minimum values of local costs for the different directions of the given subset.
 12. The method as claimed in claim 1, wherein the steps of determining the second subset of candidate directions and of detecting the directions from the second subset comprise estimating local cost functions between at least two successive images of the input video sequence having respective time positions before and after said second synthesized image of the output video sequence, and wherein the local cost functions are estimated more coarsely in the step of determining the second subset than in the step of detecting the respective directions of regularity from the second subset.
 13. The method as claimed in claim 1, wherein at least one preset direction is forced to be included into the first and second subsets.
 14. The method as claimed in claim 1, wherein the predefined set of directions is partitioned into a plurality of clusters, and in the determination of the second subset of candidate directions, one or a limited number of directions of each cluster is selected to be included into said second subset.
 15. The method as claimed in claim 1, wherein the images of the video sequences are split into a plurality of windows, wherein the second subset of candidate directions is determined for a region corresponding to the whole area of the second synthesized image, the method further comprising, for each window: determining a third subset of candidate directions as a subset of said second subset, based on cost margins determined for pixels of said window; and detecting the respective directions of regularity for pixels of said window of the second synthesized image from the third subset of candidate directions.
 16. The method as claimed in claim 15, wherein at least one of the windows is further split into a plurality of sub-windows, the method further comprising, for each sub-window of said window: determining a fourth subset of candidate directions as a subset of the third subset determined for said window, based on cost margins determined for pixels of said sub-window; and detecting the respective directions of regularity for pixels of said sub-window of the second synthesized image from the fourth subset of candidate directions.
 17. A video processing device, comprising computing circuitry arranged to analyze an input video sequence to associate pixels of synthesized images of an output video sequence with respective directions of regularity belonging to a predefined set of directions, wherein the analysis of the input video sequence comprises: determining, from the predefined set of directions, a first subset of candidate directions for a region of a first image of the output video sequence; determining, from the predefined set of directions, a second subset of candidate directions for a corresponding region of a second synthesized image of the output video sequence following the first image, based on images of the input video sequence and the first subset of candidate directions; and detecting the respective directions of regularity for pixels of said region of the second synthesized image from the second subset of candidate directions.
 18. A computer-readable medium having a program stored therein, wherein said program comprises instructions to analyze an input video sequence when said program is run in a computer processing unit, to associate pixels of synthesized images of an output video sequence with respective directions of regularity belonging to a predefined set of directions, wherein the analysis of the input video sequence comprises: determining, from the predefined set of directions, a first subset of candidate directions for a region of a first image of the output video sequence; determining, from the predefined set of directions, a second subset of candidate directions for a corresponding region of a second synthesized image of the output video sequence following the first image, based on images of the input video sequence and the first subset of candidate directions; and detecting the respective directions of regularity for pixels of said region of the second synthesized image from the second subset of candidate directions.
 19. A video processing method, comprising: receiving successive images of an input video sequence; analyzing the input video sequence to associate pixels of synthesized images of an output video sequence with respective directions of regularity belonging to a predefined set of directions; and generating the output video sequence from the input video sequence using the detected directions of regularity, wherein analyzing the input video sequence comprises: determining, from the predefined set of directions, a first subset of candidate directions for a region of a first image of the output video sequence; determining, from the predefined set of directions, a second subset of candidate directions for a corresponding region of a second synthesized image of the output video sequence following the first image, based on images of the input video sequence and the first subset of candidate directions; and detecting the respective directions of regularity for pixels of said region of the second synthesized image from the second subset of candidate directions.
 20. The method as claimed in claim 19, wherein generating the output video sequence comprises performing interpolation between successive images of the input video sequence using the detected directions of regularity.
 21. The method as claimed in claim 20, wherein the interpolation comprises video deinterlacing.
 22. The method as claimed in claim 20, wherein the interpolation comprises converting the frame rate of the input video sequence.
 23. The method as claimed in claim 19, wherein the step of generating the output video sequence comprises applying a noise reduction operation to the input video sequence using the detected directions of regularity.
 24. A video processing apparatus, comprising computing circuitry for: receiving successive images of an input video sequence; analyzing the input video sequence to associate pixels of synthesized images of an output video sequence with respective directions of regularity belonging to a predefined set of directions; and generating the output video sequence from the input video sequence using the detected directions of regularity, wherein the analysis of the input video sequence comprises: determining, from the predefined set of directions, a first subset of candidate directions for a region of a first image of the output video sequence; determining, from the predefined set of directions, a second subset of candidate directions for a corresponding region of a second synthesized image of the output video sequence following the first image, based on images of the input video sequence and the first subset of candidate directions; and detecting the respective directions of regularity for pixels of said region of the second synthesized image from the second subset of candidate directions.
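The margin-based subset update recited in claims 3, 4, 7 and 9 can be sketched in Python as follows. Claim 8 corresponds to the case n = 1. The local costs L_x(v) are assumed to be precomputed and stored as one dictionary per pixel. The first margin of a direction in the subset is read as the cost increase caused by removing it, which is one interpretation of claim 5. The second margin uses the per-pixel estimate of claim 7. Function names and the tuple encoding of directions are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Direction = Tuple[int, int, int]   # hypothetical (dx1, dx2, dt) encoding
CostMap = Dict[Direction, float]   # local costs L_x(v) at one pixel x


def subset_cost(costs: Sequence[CostMap], subset: Set[Direction]) -> float:
    """Claim 3: sum over pixels of the minimum local cost within `subset`."""
    return sum(min((c[v] for v in subset), default=float("inf")) for c in costs)


def first_margins(costs: Sequence[CostMap], subset: Set[Direction]) -> Dict[Direction, float]:
    """Assumed reading of claim 5: cost increase when each v is removed from `subset`."""
    base = subset_cost(costs, subset)
    return {v: subset_cost(costs, subset - {v}) - base for v in subset}


def second_margins(costs: Sequence[CostMap], subset: Set[Direction],
                   all_dirs: List[Direction]) -> Dict[Direction, float]:
    """Claim 7 estimate: at each pixel, credit L_x(D) - L_x(v) to the direction v
    that is best over the whole predefined set, if v lies outside `subset`."""
    margins = {v: 0.0 for v in all_dirs if v not in subset}
    for c in costs:
        best = min(all_dirs, key=lambda v: c[v])
        if best in margins:
            margins[best] += min(c[v] for v in subset) - c[best]
    return margins


def update_subset(costs: Sequence[CostMap], subset: Set[Direction],
                  all_dirs: List[Direction], n: int = 1) -> Set[Direction]:
    """Claim 9: pair the n weakest members v_i with the n strongest outsiders w_i
    and swap each pair whose second margin exceeds the first margin."""
    m1 = first_margins(costs, subset)
    m2 = second_margins(costs, subset, all_dirs)
    weakest = sorted(m1, key=m1.get)[:n]                   # increasing first margins
    strongest = sorted(m2, key=m2.get, reverse=True)[:n]   # decreasing second margins
    new = set(subset)
    for v, w in zip(weakest, strongest):
        if m2[w] > m1[v]:
            new.discard(v)
            new.add(w)
    return new
```

The claim 7 estimate only needs one pass over the pixels with the full set of local costs, rather than recomputing the subset cost once per candidate direction as the exact decrement of claim 6 would.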